Mean Square Residue Biclustering with Missing Data and Row Inversions

Authors

  • Stefan Gremalschi
  • Gulsah Altun
  • Irina Astrovskaya
  • Alex Zelikovsky
Abstract

Cheng and Church proposed a greedy deletion-addition algorithm that finds a given number k of biclusters whose mean squared residues (MSRs) are below a given threshold; missing values in the matrix are replaced with random numbers. In our previous paper we introduced the dual biclustering method with quadratic optimization, applied to missing data and row inversions. In this paper, we modify the dual biclustering method with quadratic optimization and add three new features. First, we introduce a "row status" for each row in a bicluster, and we add rows to and delete rows from biclusters based on their status in order to minimize the MSR. We compare our results with Cheng and Church's approach, in which rows are inverted only when they are added to a bicluster. We choose between a row and its negation not only at addition but also at deletion, and we show that this improves the results. Second, we give a proof of the theorem introduced by Cheng and Church in [4]. Third, we address missing data, which often occur in the matrices given for biclustering and are usually filled with random numbers. We show that ignoring the missing data is a better approach and avoids the additional noise caused by randomness. An ideal bicluster has an H value of zero; our results show a significant decrease in the H value of the biclusters, with less noise, compared with the original dual biclustering method and the Cheng and Church method.
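
The two ideas in the abstract that lend themselves to a small illustration are ignoring missing entries when computing H and choosing between a row and its negation. The following is a minimal sketch, not the authors' algorithm. It uses the standard Cheng and Church residue (entry minus row mean minus column mean plus overall mean) and assumes that missing values are encoded as NaN and are simply left out of every mean; the paper's exact treatment of missing data may differ. The names `msr` and `best_row_sign` are hypothetical.

```python
import math


def _mean(values):
    """Mean of the non-missing (non-NaN) values; NaN if none are present."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan


def msr(rows):
    """Mean squared residue H of a submatrix given as a list of rows.

    Assumption: missing entries are NaN and are skipped in every mean,
    rather than being replaced with random numbers.
    """
    row_means = [_mean(r) for r in rows]
    col_means = [_mean(col) for col in zip(*rows)]
    all_mean = _mean([v for r in rows for v in r])
    squared = [
        (v - row_means[i] - col_means[j] + all_mean) ** 2
        for i, r in enumerate(rows)
        for j, v in enumerate(r)
        if not math.isnan(v)
    ]
    return _mean(squared)


def best_row_sign(bicluster, row):
    """Try appending `row` and its negation; return (sign, H) with the lower H."""
    h_pos = msr(bicluster + [list(row)])
    h_neg = msr(bicluster + [[-v for v in row]])
    return (1, h_pos) if h_pos <= h_neg else (-1, h_neg)


# Hypothetical data: a perfectly additive 2 x 3 bicluster.
bicluster = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
print(msr(bicluster))
# The candidate row fits the additive pattern only after negation.
print(best_row_sign(bicluster, [-3.0, -4.0, -5.0]))
```

In this toy example the negated row extends the additive pattern, so the sketch selects the inverted row. The same comparison could be run when deciding whether to delete a row, which is the extension the abstract describes.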

Similar resources

Mining 3D Patterns from Gene Expression Temporal Data: A New Tricluster Evaluation Measure

Microarrays have revolutionized biotechnological research. The analysis of new data generated represents a computational challenge due to the characteristics of these data. Clustering techniques are applied to create groups of genes that exhibit a similar behavior. Biclustering emerges as a valuable tool for microarray data analysis since it relaxes the constraints for grouping, allowing genes ...

Random walk biclustering for microarray data

A biclustering algorithm, based on a greedy technique and enriched with a local search strategy to escape poor local minima, is proposed. The algorithm starts with an initial random solution and searches for a locally optimal solution by successive transformations that improve a gain function. The gain function combines the mean squared residue, the row variance, and the size of the bicluster. ...

Gene Expression Biclustering Using Random Walk Strategies

A biclustering algorithm, based on a greedy technique and enriched with a local search strategy to escape poor local minima, is proposed. The algorithm starts with an initial random solution and searches for a locally optimal solution by successive transformations that improve a gain function, combining the mean squared residue, the row variance, and the size of the bicluster. Different strateg...

bicACO: An Ant Colony Inspired Biclustering Algorithm

A recent proposal developed to avoid some of the drawbacks presented by standard clustering algorithms is the so-called biclustering technique [1], which performs clustering of rows and columns of the data matrix simultaneously, allowing the extraction of additional information from the dataset. Since the biclustering problem is combinatorial, and ant-based systems present several advantages wh...

Predicting missing values with biclustering: A coherence-based approach

In this work, a novel biclustering-based approach to data imputation is proposed. This approach is based on the Mean Squared Residue metric, used to evaluate the degree of coherence among objects of a dataset, and presents an algebraic development that allows the modeling of the predictor as a quadratic programming problem. The proposed methodology is positioned in the field of missing data, it...

Publication date: 2009